104 research outputs found

    Partial Learning of Lexicalized Grammars

    International audience. In theory, Gold's model seems well suited to the learning of natural languages. However, putting the acquisition algorithms derived from this model into practice raises many problems. In this article we develop results that build on the work of Buszkowski, Penn and Kanazawa, who showed that certain classes of categorial grammars are learnable. The original algorithm requires a large amount of input information to be effective. By changing the nature of the input information, we propose a learning algorithm for categorial grammars that is more realistic with a view to natural-language applications. This method can be extended to some lexicalized grammatical formalisms, such as link grammars. The experiment we present with this formalism tends to show the feasibility of our approach.
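    The Buszkowski–Penn algorithm (usually called RG) that this work builds on learns a rigid AB categorial grammar from function-argument structures: each tree is labeled top-down, starting from the sentence type s, with fresh type variables, and all types collected for the same word are then unified. The sketch below shows that classic idea only, not the partial variant proposed in the paper. The tuple encoding of types, the `FA`/`BA` markers (forward application with the functor on the left, backward application with the functor on the right) and the function names are our own assumptions.

```python
from itertools import count

# Hypothetical type encoding: ("prim", name) | ("var", n)
#   ("/", A, B)  stands for A/B  (looks for a B on its right)
#   ("\\", B, A) stands for B\A  (looks for a B on its left)
S = ("prim", "s")

def label(tree, typ, fresh, lexicon):
    """Top-down labeling of a function-argument structure with fresh variables."""
    if isinstance(tree, str):                    # leaf: record the type for this word
        lexicon.setdefault(tree, []).append(typ)
        return
    rule, left, right = tree
    x = ("var", next(fresh))
    if rule == "FA":                             # functor left:  left : typ/x, right : x
        label(left, ("/", typ, x), fresh, lexicon)
        label(right, x, fresh, lexicon)
    elif rule == "BA":                           # functor right: left : x, right : x\typ
        label(left, x, fresh, lexicon)
        label(right, ("\\", x, typ), fresh, lexicon)
    else:
        raise ValueError(f"unknown rule {rule!r}")

def walk(t, subst):
    """Follow variable bindings until reaching an unbound variable or a non-variable."""
    while t[0] == "var" and t in subst:
        t = subst[t]
    return t

def occurs(v, t, subst):
    t = walk(t, subst)
    if t == v:
        return True
    return t[0] in ("/", "\\") and (occurs(v, t[1], subst) or occurs(v, t[2], subst))

def unify(a, b, subst):
    """Syntactic unification with occurs check; extends subst in place."""
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return True
    if a[0] == "var":
        if occurs(a, b, subst):
            return False
        subst[a] = b
        return True
    if b[0] == "var":
        return unify(b, a, subst)
    if a[0] == b[0] and a[0] in ("/", "\\"):
        return unify(a[1], b[1], subst) and unify(a[2], b[2], subst)
    return False                                 # different primitives or slash directions

def resolve(t, subst):
    t = walk(t, subst)
    if t[0] in ("/", "\\"):
        return (t[0], resolve(t[1], subst), resolve(t[2], subst))
    return t

def learn_rigid(structures):
    """Return a rigid lexicon {word: type}, or None if no rigid grammar fits."""
    fresh, lexicon, subst = count(1), {}, {}
    for tree in structures:
        label(tree, S, fresh, lexicon)
    for types in lexicon.values():
        for t in types[1:]:
            if not unify(types[0], t, subst):
                return None
    return {w: resolve(ts[0], subst) for w, ts in lexicon.items()}
```

    For example, `learn_rigid([("BA", "John", "runs"), ("BA", "Mary", "runs")])` gives John and Mary the same variable type x and runs the type x\s. Rigidity (one type per word) is what makes a single unification pass sufficient.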

    Partial Learning Using Link Grammars Data

    International audience. Kanazawa has shown that several non-trivial classes of categorial grammars are learnable in Gold's model. In this article we propose to adapt this kind of symbolic learning to natural languages. To compensate for the combinatorial explosion of the learning algorithm, we assume that a small part of the grammar to be learned is given as input. This is why we need some initial data to test the feasibility of the approach: link grammars are closely related to categorial grammars, and we use the existing English lexicon in this formalism.
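    To illustrate why supplying part of the grammar shrinks the search, here is a hypothetical sketch, not the paper's algorithm: in an AB categorial grammar whose lexicon is known for every word except a sentence-final one, a CKY-style chart over the known prefix directly yields the candidate types A\s for the unknown word, where A is any type the prefix reduces to. The tuple encoding, the toy lexicon and the function names are our own assumptions.

```python
N, S = ("prim", "n"), ("prim", "s")
# ("/", A, B) stands for A/B; ("\\", B, A) stands for B\A.

def combine(left, right):
    """AB rules: A/B, B => A (forward) and B, B\\A => A (backward)."""
    out = set()
    if left[0] == "/" and left[2] == right:
        out.add(left[1])
    if right[0] == "\\" and right[1] == left:
        out.add(right[2])
    return out

def chart(cells):
    """CKY chart over a list of type sets; table[(i, j)] = types derivable for words i..j-1."""
    n = len(cells)
    table = {(i, i + 1): set(cells[i]) for i in range(n)}
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            table[(i, j)] = set()
            for k in range(i + 1, j):
                for a in table[(i, k)]:
                    for b in table[(k, j)]:
                        table[(i, j)] |= combine(a, b)
    return table

def derives(words, lexicon, goal=S):
    """True if the sentence reduces to the goal type under the given lexicon."""
    return goal in chart([lexicon[w] for w in words])[(0, len(words))]

def guess_final_word(words, lexicon, goal=S):
    """Candidate types A\\goal for an unknown last word, A spanning the known prefix."""
    prefix = words[:-1]
    if not prefix:
        return set()
    spans = chart([lexicon[w] for w in prefix])[(0, len(prefix))]
    return {("\\", a, goal) for a in spans}

# Toy partial lexicon (hypothetical): word -> set of possible types.
lexicon = {"John": {N}, "Mary": {N}, "likes": {("/", ("\\", N, S), N)}}
```

    With this lexicon, "John likes Mary" is recognized, and for "John sleeps" the only candidate for the unknown word is n\s. Only the sentence-final position is handled here, to keep the sketch short.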

    LIPN UIMA Platform 0.1.* User/developer guide

    This is a "work in progress" version. This document describes the "LIPN UIMA Platform" software, developed in 2010 by the author at LIPN. The software mainly consists of an evolving, UIMA-based platform devoted to corpus annotation. It is divided into two modules (lipn-uima-core and lipn-nlptools-utils) and is distributed as an archive containing an environment that provides a few tools (scripts, CPE descriptors, examples, etc.). Currently the software is located at: http://www-lipn.univ-paris13.fr/~moreau/uima/lipn-uima-core.tg

    On learning discontinuous dependencies from positive data

    International audience. This paper is concerned with learning, in Gold's model, Categorial Dependency Grammars (CDG), which express discontinuous (non-projective) dependencies. We show that rigid and k-valued CDGs (without optional and iterative types) are learnable from strings. More precisely, we prove that the languages of dependency nets coding rigid CDGs have finite elasticity, and we give a learning algorithm. As a standard corollary, this result implies the learnability of rigid or k-valued CDGs (without optional and iterative types) from strings.
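    "Non-projective" has a simple operational meaning: a dependency tree is projective when every word lying between a head and its dependent is itself a descendant of that head, and discontinuous dependencies are exactly those that break this condition. The sketch below checks that standard definition on a head array; it does not implement CDGs or the paper's learning algorithm, and the encoding (`heads[i]` is the index of word i's head, -1 for the root) is our assumption.

```python
def dominates(heads, h, k):
    """True if word h is an ancestor of (or equal to) word k; heads[i] == -1 marks the root."""
    while k != -1:
        if k == h:
            return True
        k = heads[k]
    return False

def is_projective(heads):
    """Every word strictly between a head and its dependent must be dominated by that head."""
    for d, h in enumerate(heads):
        if h == -1:
            continue
        lo, hi = min(h, d), max(h, d)
        if not all(dominates(heads, h, k) for k in range(lo + 1, hi)):
            return False
    return True
```

    For instance, `[-1, 3, 0, 0]` is non-projective: the arc from word 3 to word 1 spans word 2, which hangs from word 0 instead.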

    GDMS-R: A mixed SQL to manage raster and vector data

    11 p. International audience. To evaluate the impact of urbanization on territories, accurate knowledge of the urban and peri-urban fabric is essential. To provide an advanced characterization of the terrain, modern GIS applications target ever wider geographic areas at ever finer resolutions, and they must also mix data of different types, such as Digital Elevation Models (raster layers), buildings (polygon layers) and roads (polyline layers). Processing raster and vector data with the same semantics and in an efficient way is a significant challenge for GIS, since the underlying granularities, data layouts and processing patterns may be entirely different. We previously focused on the definition and implementation of an abstraction layer called GDMS (Generic Datasource Management System) to handle and process vector data. The main objectives of GDMS were to provide the user not only with a simple and powerful API but also with a spatial language derived from SQL. Moreover, as an intermediate layer between the user and the information source, GDMS aims to reduce the coupling between processes and the specificities of each underlying format. As a consequence, earlier work can easily be reused in a much larger set of scenarios, and the learning curve is correspondingly simpler. In this paper, we propose a raster extension to the GDMS layer, called GDMS-R. Although there is currently no OGC standard for raster processing (using the well-known SQL language), there is a de facto standard called Map Algebra, defined by C. D. Tomlin in 1990 and commonly implemented in a wide range of GIS. Our objective is somewhat different, since we propose to extend the SQL language. We present the integration of Map Algebra concepts into GDMS through the GRAP (GeoRAster Processing) language. As with GDMS, reuse is enhanced by vendor independence (a middleware approach) and by the extension capabilities of the underlying SQL language.
To demonstrate the capabilities of GDMS-R, we present a use case on the profound impact of increased urbanization on the vulnerability of peri-urban hydro-systems: the impact of linear constraints on runoff water pathways and accumulation, which uses both vector and raster data in a unified way.
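    Tomlin's Map Algebra classifies raster operations into families such as local operations (each output cell depends only on the same cell of the input rasters) and focal operations (each output cell depends on a neighborhood of the input cell). The sketch below illustrates those two families on nested Python lists; it is not GDMS-R or GRAP, whose syntax is not shown here, and the function names and edge-clipping choice are our assumptions.

```python
from typing import Callable, List

Raster = List[List[float]]

def local_op(f: Callable[..., float], *rasters: Raster) -> Raster:
    """Local operation: combine aligned rasters cell by cell."""
    rows, cols = len(rasters[0]), len(rasters[0][0])
    return [[f(*(r[i][j] for r in rasters)) for j in range(cols)] for i in range(rows)]

def focal_op(f: Callable[[List[float]], float], raster: Raster, radius: int = 1) -> Raster:
    """Focal operation: apply f to a square window around each cell, clipped at the edges."""
    rows, cols = len(raster), len(raster[0])
    out = []
    for i in range(rows):
        row = []
        for j in range(cols):
            window = [raster[a][b]
                      for a in range(max(0, i - radius), min(rows, i + radius + 1))
                      for b in range(max(0, j - radius), min(cols, j + radius + 1))]
            row.append(f(window))
        out.append(row)
    return out
```

    A focal sum over a 3x3 raster of ones yields 9 in the center, 6 on the edges and 4 in the corners, because windows are clipped at the raster boundary rather than padded.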

    GDMS: An abstraction layer to enhance Spatial Data Infrastructures usability

    15 p. International audience. The practical exploitation of SDIs (Spatial Data Infrastructures) raises a number of issues as they grow. Among them is the heterogeneity of data sources, and thus the difficulty GIS users face in working independently of the data source format without having to learn several different systems. This is a major flaw with respect to reuse and data sharing. The purpose of our work is to propose a new semantic layer, derived from the SQL language, that is independent of the underlying data source. This layer, called GDMS (Generic Data source Management System), can first be seen as an abstraction layer between data sources and SDI tools. We also show how this layer extends both SQL and spatial semantics and improves the exploitation of SDIs by providing feedback in terms of both work and data reuse. A simple example mixing heterogeneous data sources is presented.
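    The abstraction-layer idea (queries written once against a common interface, whatever the underlying format) can be sketched with an abstract base class and two interchangeable backends. Everything here is hypothetical and illustrative: `DataSource`, the two backends, `select` and `join` are our names, and GDMS's actual API is not shown.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Iterator, List

Row = Dict[str, object]

class DataSource(ABC):
    """Hypothetical uniform interface: every backend exposes its records as dict rows."""
    @abstractmethod
    def rows(self) -> Iterator[Row]: ...

class CsvTextSource(DataSource):
    """Backend over comma-separated text whose first line is the header."""
    def __init__(self, text: str):
        self.text = text

    def rows(self) -> Iterator[Row]:
        lines = self.text.strip().splitlines()
        header = lines[0].split(",")
        for line in lines[1:]:
            yield dict(zip(header, line.split(",")))

class InMemorySource(DataSource):
    """Backend over a list of records already in memory."""
    def __init__(self, records: List[Row]):
        self.records = records

    def rows(self) -> Iterator[Row]:
        return iter(self.records)

def select(source: DataSource, columns: List[str],
           where: Callable[[Row], bool] = lambda r: True) -> List[Row]:
    """SQL-like SELECT columns FROM source WHERE predicate, independent of the backend."""
    return [{c: r[c] for c in columns} for r in source.rows() if where(r)]

def join(left: DataSource, right: DataSource, key: str) -> List[Row]:
    """Equi-join two sources of possibly different formats on a shared column."""
    index: Dict[object, List[Row]] = {}
    for r in right.rows():
        index.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left.rows() for r in index.get(l[key], [])]
```

    Because `select` and `join` only see `rows()`, adding a new format means writing one backend class, which is the decoupling the abstract describes.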

    Functional Annotation of Treebanks with Conditional Random Fields

    National audience. The aim of this article is to assess to what extent the "syntactic functions" annotated in part of the Paris 7 treebank can be learned from examples. The machine-learning technique used for this purpose relies on Conditional Random Fields (CRF), in a variant adapted to tree annotation. The experiments carried out are described and analyzed in detail. With good parameter settings, they reach an F1-measure of over 80%.
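    What makes a CRF usable on trees is that its normalization constant can still be computed exactly by passing messages from the leaves up to the root (sum-product), just as the forward algorithm does for chains. The sketch below shows that generic computation for a tree-structured CRF with node and parent-child log-potentials, checked against brute-force enumeration; it is not the authors' implementation, and the function names and score functions are hypothetical.

```python
import itertools
import math
from functools import lru_cache

def tree_partition(parent, labels, node_score, edge_score):
    """Partition function Z of a tree-structured CRF via upward sum-product messages.

    parent[i] is the parent of node i (-1 for the root); node_score(i, y) and
    edge_score(y_parent, y_child) are log-potentials.
    """
    children = {i: [] for i in range(len(parent))}
    for i, p in enumerate(parent):
        if p >= 0:
            children[p].append(i)

    @lru_cache(maxsize=None)
    def beta(i, y):
        # Sum of exp-scores over all labelings of the subtree rooted at i, given label y at i.
        total = math.exp(node_score(i, y))
        for c in children[i]:
            total *= sum(math.exp(edge_score(y, yc)) * beta(c, yc) for yc in labels)
        return total

    root = parent.index(-1)
    return sum(beta(root, y) for y in labels)

def brute_force_partition(parent, labels, node_score, edge_score):
    """Same quantity by enumerating every labeling (exponential; for checking only)."""
    z = 0.0
    for ys in itertools.product(labels, repeat=len(parent)):
        s = sum(node_score(i, y) for i, y in enumerate(ys))
        s += sum(edge_score(ys[p], ys[i]) for i, p in enumerate(parent) if p >= 0)
        z += math.exp(s)
    return z
```

    Memoizing `beta` keeps the cost linear in the number of nodes (times the squared number of labels), whereas enumeration grows exponentially with tree size.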

    NeuMiss networks: differentiable programming for supervised learning with missing values

    International audience. The presence of missing values makes supervised learning much more challenging. Indeed, previous work has shown that even when the response is a linear function of the complete data, the optimal predictor is a complex function of the observed entries and the missingness indicator. As a result, the computational or sample complexities of consistent approaches depend on the number of missing-data patterns, which can be exponential in the number of dimensions. In this work, we derive the analytical form of the optimal predictor under a linearity assumption and various missing-data mechanisms, including Missing At Random (MAR) and self-masking (Missing Not At Random). Based on a Neumann-series approximation of the optimal predictor, we propose a new principled architecture, named NeuMiss networks. Their originality and strength come from the use of a new type of non-linearity: multiplication by the missingness indicator. We provide an upper bound on the Bayes risk of NeuMiss networks, and show that they have good predictive accuracy with both a number of parameters and a computational complexity independent of the number of missing-data patterns. As a result, they scale well to problems with many features and remain statistically efficient for medium-sized samples. Moreover, we show that, contrary to procedures using EM or imputation, they are robust to the missing-data mechanism, including difficult MNAR settings such as self-masking.
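    The core mechanism can be illustrated with an "oracle" sketch under assumptions that are ours, not the paper's code: Gaussian covariates with known mean `mu` and covariance `sigma`, known linear coefficients, MAR missingness, and a spectral radius of I - Σ_obs below 1, so that Σ_obs^{-1} = Σ_k (I - Σ_obs)^k converges. Each term of the truncated series is obtained by multiplying by the full matrix I - Σ and then by the missingness mask, which is the mask non-linearity named in the abstract. NeuMiss networks learn the corresponding weights from data instead of using known parameters; all function names below are hypothetical.

```python
def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def solve(A, b):
    """Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def neumann_predict(x, mask, mu, sigma, beta0, beta, depth):
    """Truncated Neumann series; mask[i] == 1 means x[i] is observed."""
    d = len(x)
    i_minus = [[(1.0 if i == j else 0.0) - sigma[i][j] for j in range(d)] for i in range(d)]
    h = [(xi - mi) if m else 0.0 for xi, m, mi in zip(x, mask, mu)]  # zero on missing entries
    acc = h[:]
    for _ in range(depth):
        # Full matrix product followed by multiplication by the mask = (I - Sigma_obs) on observed coords.
        h = [v if m else 0.0 for m, v in zip(mask, matvec(i_minus, h))]
        acc = [a + v for a, v in zip(acc, h)]
    cond = matvec(sigma, acc)  # on missing rows: Sigma_mis,obs Sigma_obs^{-1} (x_obs - mu_obs)
    z = [xi if m else mi + ci for xi, m, mi, ci in zip(x, mask, mu, cond)]
    return beta0 + sum(b * zi for b, zi in zip(beta, z))

def exact_predict(x, mask, mu, sigma, beta0, beta):
    """Same conditional-expectation predictor with an exact solve on the observed block."""
    obs = [i for i, m in enumerate(mask) if m]
    s_oo = [[sigma[i][j] for j in obs] for i in obs]
    v = solve(s_oo, [x[i] - mu[i] for i in obs])
    z = [x[j] if mask[j] else mu[j] + sum(sigma[j][i] * vi for i, vi in zip(obs, v))
         for j in range(len(x))]
    return beta0 + sum(b * zj for b, zj in zip(beta, z))
```

    Missing entries of `x` can hold any placeholder (even NaN), since both functions only read observed coordinates of `x`. With enough terms the truncated series matches the exact solve; with few terms it gives a cheaper approximation.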

    Recovery of Bennu's Orientation for the OSIRIS-REx Mission: Implications for the Spin State Accuracy and Geolocation Errors

    The goal of the OSIRIS-REx mission is to return a sample of asteroid material from Near-Earth Asteroid (101955) Bennu. The role of the navigation and flight dynamics team is critical for the spacecraft to execute a precisely planned sampling maneuver over a specifically selected landing site. In particular, the orientation of Bennu needs to be recovered with good accuracy during orbital operations so that it contributes as small an error as possible to the landing error budget. Although Bennu is well characterized from Earth-based radar observations, its orientation dynamics are not sufficiently known to exclude the presence of a small wobble. To better understand this contingency and evaluate how well the orientation can be recovered in the presence of a large 1-degree wobble, we conduct a comprehensive simulation with the NASA GSFC GEODYN orbit determination and geodetic parameter estimation software. We describe the dynamic orientation modeling implemented in GEODYN in support of OSIRIS-REx operations, and show how both altimetry and imagery data can be used as either undifferenced (landmark, direct altimetry) or differenced (image crossover, altimetry crossover) measurements. We find that these two types of data contribute differently to the recovery of instrument pointing or planetary orientation. When upweighted, the absolute measurements help reduce the geolocation errors, despite poorer astrometric (inertial) performance. We find that with no wobble present, all the geolocation requirements are met. While the presence of a large wobble is detrimental, the recovery is still reliable thanks to the combined use of altimetry and imagery data.
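    Why orientation accuracy feeds the geolocation budget can be seen with a one-line geometric argument: if the body's orientation model is off by a small angle δ about some axis, a surface point at distance r from that axis is misplaced by 2 r sin(δ/2) ≈ r δ. The sketch below computes this with Rodrigues' rotation formula; the 250 m radius is an illustrative round number, not Bennu's published value, and the function names are hypothetical. It is not the GEODYN modeling described above.

```python
import math

def rotate(v, axis, angle):
    """Rodrigues' rotation of vector v about the unit vector axis by angle (radians)."""
    kx, ky, kz = axis
    vx, vy, vz = v
    c, s = math.cos(angle), math.sin(angle)
    dot = kx * vx + ky * vy + kz * vz
    cross = (ky * vz - kz * vy, kz * vx - kx * vz, kx * vy - ky * vx)
    return tuple(vi * c + ci * s + ki * dot * (1 - c)
                 for vi, ci, ki in zip(v, cross, axis))

def geolocation_error(point, axis, angle):
    """Distance a body-fixed surface point moves if the orientation is off by angle."""
    return math.dist(point, rotate(point, axis, angle))

r = 250.0  # illustrative radius in meters (assumption, not a measured value)
err = geolocation_error((r, 0.0, 0.0), (0.0, 0.0, 1.0), math.radians(1.0))
```

    Under these assumptions a 1-degree orientation error displaces an equatorial point by about 4.4 m, while a point on the rotation axis does not move at all.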